Dual-microphone Voice Activity Detection Incorporating Gaussian Mixture Models with an Error Correction Scheme in Non-stationary Noise Environments

Authors

  • Ji Hun Park
  • Hong Kook Kim
Abstract

In this paper, a voice activity detection (VAD) method is proposed based on Gaussian mixture models (GMMs) that exploits the spatial selectivity available in dual-microphone environments. Specifically, one GMM is constructed for each direction of arrival (DOA) and used to detect speech intervals. Under the assumption that the target speech source is located in front of the two microphones, VAD is performed by comparing the likelihood obtained from the GMM constructed for the front direction with the likelihoods obtained from the GMMs for the other DOAs. In addition, to mitigate false rejection errors caused by the low spatial correlation in unvoiced intervals of the target speech, the VAD results are refined by an error correction scheme. This scheme analyzes the ratio between the energies of the high and low frequency bands (HILO) to discriminate between an unvoiced speech interval and a non-speech interval. The proposed GMM-based VAD method with HILO-based error correction is evaluated by measuring the false alarm rate (FAR) and the false rejection rate (FRR) and comparing them with those of conventional dual-microphone VAD methods. The FAR and FRR are measured by comparing the output of each VAD method with a manual segmentation. The evaluation shows that the proposed GMM-based VAD method with HILO-based error correction outperforms both a Gaussian kernel density-based VAD method and a GMM-based VAD method without error correction.
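
The decision logic described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the scalar spatial feature `x`, the one-dimensional GMMs given as `(weight, mean, variance)` triples, the rule that a frame counts as speech only if the front GMM beats every other DOA model, the `split_bin` and `hilo_threshold` parameters, the assumption that a high HILO ratio indicates unvoiced speech, and the frame-level definitions of FAR and FRR are all assumptions made for illustration.

```python
import math


def gmm_loglik(x, gmm):
    """Log-likelihood of scalar x under a 1-D GMM of (weight, mean, var) triples."""
    terms = [
        math.log(w) - 0.5 * (math.log(2 * math.pi * var) + (x - mu) ** 2 / var)
        for w, mu, var in gmm
    ]
    m = max(terms)  # log-sum-exp for numerical stability
    return m + math.log(sum(math.exp(t - m) for t in terms))


def gmm_vad(x, front_gmm, other_gmms):
    """Assumed rule: speech if the front-DOA GMM is more likely than every other DOA GMM."""
    front = gmm_loglik(x, front_gmm)
    return all(front > gmm_loglik(x, g) for g in other_gmms)


def hilo_ratio(power_spectrum, split_bin):
    """Ratio of high-band to low-band energy; split_bin is a hypothetical band edge."""
    low = sum(power_spectrum[:split_bin])
    high = sum(power_spectrum[split_bin:])
    return high / max(low, 1e-12)  # guard against division by zero


def corrected_vad(x, power_spectrum, front_gmm, other_gmms, split_bin, hilo_threshold):
    """Keep GMM speech decisions; re-label rejected frames with a high HILO ratio as speech."""
    if gmm_vad(x, front_gmm, other_gmms):
        return True
    # Assumption: unvoiced speech carries relatively more high-frequency energy.
    return hilo_ratio(power_spectrum, split_bin) > hilo_threshold


def far_frr(decisions, reference):
    """Frame-level FAR and FRR against a reference (e.g. manual) segmentation."""
    on_non_speech = [d for d, r in zip(decisions, reference) if not r]
    on_speech = [d for d, r in zip(decisions, reference) if r]
    far = sum(on_non_speech) / len(on_non_speech) if on_non_speech else 0.0
    frr = sum(not d for d in on_speech) / len(on_speech) if on_speech else 0.0
    return far, frr
```

In this sketch the correction step can only turn a non-speech decision into a speech decision, which matches the stated goal of reducing false rejections; whether the original scheme also revises speech decisions is not stated in the abstract.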


Similar resources

Voice-based Age and Gender Recognition using Training Generative Sparse Model

Abstract: Gender recognition and age detection are important problems in telephone speech processing to investigate the identity of an individual using voice characteristics. In this paper a new gender and age recognition system is introduced based on generative incoherent models learned using sparse non-negative matrix factorization and atom correction post-processing method. Similar to genera...


Speech Enhancement Using Gaussian Mixture Models, Explicit Bayesian Estimation and Wiener Filtering

Gaussian Mixture Models (GMMs) of power spectral densities of speech and noise are used with explicit Bayesian estimations in Wiener filtering of noisy speech. No assumption is made on the nature or stationarity of the noise. No voice activity detection (VAD) or any other means is employed to estimate the input SNR. The GMM mean vectors are used to form sets of over-determined system of equatio...


Voice activity detection using frame-wise model re-estimation method based on Gaussian pruning with weight normalization

This paper proposes a frame-wise model re-estimation method based on Gaussian pruning with weight normalization for noise robust voice activity detection (VAD). Our previous work, switching Kalman filter-based VAD, sequentially estimates a non-stationary noise Gaussian mixture model (GMM) and constructs GMMs of observed noisy speech signals by composing pre-trained silence and clean GMMs and se...


A model based voice activity detector for noisy environments

This paper presents a model-based voice activity detector (VAD) aimed at operating in low signal to noise ratio conditions and non-stationary noise environments. The proposed system makes use of Gaussian mixture models trained on Mel Frequency Cepstral Coefficients extracted from noisy speech data. In addition, information from smoothed frame based log energy is used to augment the system to de...


On the use of Machine Learning Methods for Speech and Voicing Classification

This work examines the effectiveness of machine learning (ML) classifiers on the problems of voice activity detection and voicing classification. A wide range of ML classifiers are considered and include parametric, probabilistic and non-probabilistic, artificial neural networks and regression. Evaluations are carried out in both stationary and non-stationary noise types at signal-to-noise rati...




Publication date: 2013